Dissociation and Propagation for Efficient Query Evaluation over Probabilistic Databases

Authors

  • Wolfgang Gatterbauer
  • Abhay Jha
  • Dan Suciu
Abstract

Queries over probabilistic databases are either safe, in which case they can be evaluated entirely in a relational database engine, or unsafe, in which case they need to be evaluated with a general-purpose inference engine at a high cost. We propose a new approach by which every query is evaluated inside the database engine, by using a new method called dissociation. A dissociated query is obtained by adding extraneous variables to some atoms until the query becomes safe. We show that the probability of the original query and that of the dissociated query correspond to two well-known scoring functions on graphs, namely graph reliability (which is #P-hard), and the propagation score (which is related to PageRank and is in PTIME): When restricted to graphs, standard query probability is graph reliability, while the dissociated probability is the propagation score. We define a propagation score for self-join-free conjunctive queries and prove that it is always an upper bound for query reliability, and that both scores coincide for all safe queries. Given the widespread and successful use of graph propagation methods in practice, we argue for the dissociation method as a highly efficient way to rank probabilistic query results, especially for those queries which are highly intractable for exact probabilistic inference.
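The gap between the two graph scores named in the abstract can be seen on a toy example. The sketch below is illustrative only: the graph, its edge probabilities, and the function names are hypothetical. The propagation recursion score(y) = 1 - prod over incoming edges (x, y) of (1 - score(x) * p(x, y)) is an assumed formulation, since the abstract does not spell out the formula. Reliability is computed by brute-force enumeration of all possible worlds, which is only feasible for very small graphs.

```python
from itertools import product

# Hypothetical toy graph: each directed edge is present independently
# with the given probability. Both s-t paths share the edge (s, a).
EDGES = {
    ("s", "a"): 0.5,
    ("a", "b"): 0.5,
    ("a", "c"): 0.5,
    ("b", "t"): 0.5,
    ("c", "t"): 0.5,
}
TOPO_ORDER = ["s", "a", "b", "c", "t"]  # a topological order of the DAG


def reachable(present, source, target):
    """Depth-first search over the edges present in one possible world."""
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        for u, v in present:
            if u == node and v not in seen:
                seen.add(v)
                stack.append(v)
    return target in seen


def reliability(edges, source, target):
    """Exact s-t reliability: sum the weight of every world that connects them."""
    items = list(edges.items())
    total = 0.0
    for mask in product([False, True], repeat=len(items)):
        weight = 1.0
        present = []
        for keep, (edge, p) in zip(mask, items):
            weight *= p if keep else 1.0 - p
            if keep:
                present.append(edge)
        if reachable(present, source, target):
            total += weight
    return total


def propagation_score(edges, order, source, target):
    """Assumed recursion: treat the incoming edges of each node as independent."""
    score = {source: 1.0}
    for node in order:
        if node == source:
            continue
        miss = 1.0
        for (u, v), p in edges.items():
            if v == node:
                miss *= 1.0 - score.get(u, 0.0) * p
        score[node] = 1.0 - miss
    return score[target]


if __name__ == "__main__":
    print("reliability:", reliability(EDGES, "s", "t"))
    print("propagation:", propagation_score(EDGES, TOPO_ORDER, "s", "t"))
```

On this graph the reliability is 0.21875 and the propagation score is 0.234375. The propagation score is larger because it counts the shared edge (s, a) once per path, as if the two paths were independent, which is consistent with the upper-bound claim in the abstract.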

Similar resources

Probabilistic Databases with MarkoViews

Most of the work on query evaluation in probabilistic databases has focused on the simple tuple-independent data model, where tuples are independent random events. Several efficient query evaluation techniques exist in this setting, such as safe plans, algorithms based on OBDDs, tree decomposition, and a variety of approximation algorithms. However, complex data analytics tasks often require com...

Read-Once Functions and Query Evaluation in Probabilistic Databases

Probabilistic databases hold promise as a viable means of large-scale uncertainty management, increasingly needed in a number of real-world application domains. However, query evaluation in probabilistic databases remains a computational challenge. Prior work on efficient exact query evaluation in probabilistic databases has largely concentrated on query-centric formulations (e.g., safe...

Scalable Statistical Modeling and Query Processing over Large Scale Uncertain Databases

Dissertation by Bhargav Kanagal Shamanna, Doctor of Philosophy, 2011. Directed by Dr. Amol Deshpande, Dept. of Computer Science. The past decade has witnessed a large number of novel applications that generate imprecise, uncertain and incomplete data. Examples include monitoring infrastructu...

Efficient Query Evaluation over Temporally Correlated Probabilistic Streams

Many real-world applications such as sensor networks and other monitoring applications naturally generate probabilistic streams that are highly correlated in both time and space. Query processing over such streaming data must be cognizant of these correlations, since they significantly alter the final query results. Several prior works have suggested approaches to handling correlations in proba...

Graphical Models for Uncertain Data

Graphical models are a popular and well-studied framework for compact representation of a joint probability distribution over a large number of interdependent variables, and for efficient reasoning about such a distribution. They have been proven useful in a wide range of domains from natural language processing to computer vision to bioinformatics. In this chapter, we present an approach to us...

Publication date: 2010